HBO

Ashley Wright & Mubeena Wahaj

2023-04-13

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 1.0.0 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(shiny)
## Warning: package 'shiny' was built under R version 4.2.3
library(dplyr)

Lights, camera, action!

Today, we’re going to take a deep dive into the world of HBO movies and TV shows. From iconic dramas like The Sopranos and Game of Thrones to the latest releases, HBO has been providing quality content to its viewers for decades. But have you ever wondered how they decide which shows to produce or which movies to acquire? That’s where the fascinating world of HBO data comes into play. By analyzing audience trends, ratings, and viewer demographics, HBO can make informed decisions about what to offer its loyal fans. So sit back, grab some popcorn, and get ready to explore the exciting world of HBO data.

The data we’ve decided to work with comes from Kaggle, where it was published by Diego Enrique. Here’s the link: https://www.kaggle.com/datasets/dgoenrique/hbo-max-movies-and-tv-shows

Let’s read in our data, shall we?

credits <- read.csv("credits.csv", stringsAsFactors = FALSE)
titles <- read.csv("titles.csv", stringsAsFactors = FALSE)
glimpse(credits)
## Rows: 64,879
## Columns: 5
## $ person_id <int> 14701, 14702, 14703, 14704, 14705, 14706, 1367, 14716, 14707…
## $ id        <chr> "tm77588", "tm77588", "tm77588", "tm77588", "tm77588", "tm77…
## $ name      <chr> "Humphrey Bogart", "Ingrid Bergman", "Paul Henreid", "Claude…
## $ character <chr> "Rick Blaine", "Ilsa Lund", "Victor Laszlo", "Captain Louis …
## $ role      <chr> "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR…
glimpse(titles)
## Rows: 3,030
## Columns: 15
## $ id                   <chr> "tm77588", "tm155702", "tm83648", "tm3175", "ts22…
## $ title                <chr> "Casablanca", "The Wizard of Oz", "Citizen Kane",…
## $ type                 <chr> "MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVI…
## $ description          <chr> "In Casablanca, Morocco in December 1941, a cynic…
## $ release_year         <int> 1943, 1939, 1941, 1945, 1940, 1940, 1946, 1934, 1…
## $ age_certification    <chr> "PG", "G", "PG", "", "", "G", "", "", "", "PG-13"…
## $ runtime              <int> 102, 102, 119, 113, 8, 238, 114, 93, 111, 109, 12…
## $ genres               <chr> "['drama', 'romance', 'war']", "['fantasy', 'fami…
## $ production_countries <chr> "['US']", "['US']", "['US']", "['US']", "['US']",…
## $ seasons              <dbl> NA, NA, NA, NA, 16, NA, NA, NA, NA, NA, NA, NA, N…
## $ imdb_id              <chr> "tt0034583", "tt0032138", "tt0033467", "tt0037059…
## $ imdb_score           <dbl> 8.5, 8.1, 8.3, 7.5, 7.7, 8.2, 7.9, 7.9, 7.9, 8.3,…
## $ imdb_votes           <dbl> 577842, 406105, 446627, 25589, 859, 319463, 87289…
## $ tmdb_popularity      <dbl> 22.005, 56.631, 19.900, 8.311, 1.400, 27.535, 11.…
## $ tmdb_score           <dbl> 8.167, 7.583, 8.022, 7.000, 10.000, 8.000, 7.700,…

Whoops! Let’s make it a little more readable.

head(credits)
##   person_id      id               name               character  role
## 1     14701 tm77588    Humphrey Bogart             Rick Blaine ACTOR
## 2     14702 tm77588     Ingrid Bergman               Ilsa Lund ACTOR
## 3     14703 tm77588       Paul Henreid           Victor Laszlo ACTOR
## 4     14704 tm77588       Claude Rains   Captain Louis Renault ACTOR
## 5     14705 tm77588       Conrad Veidt Major Heinrich Strasser ACTOR
## 6     14706 tm77588 Sydney Greenstreet          Signor Ferrari ACTOR
head(titles)
##         id                title  type
## 1  tm77588           Casablanca MOVIE
## 2 tm155702     The Wizard of Oz MOVIE
## 3  tm83648         Citizen Kane MOVIE
## 4   tm3175 Meet Me in St. Louis MOVIE
## 5 ts225761        Tom and Jerry  SHOW
## 6 tm156463   Gone with the Wind MOVIE
##                                                                                                                                                                                                                                                                                                                                                                                             description
## 1                                                                                                                                                                                                                                                                           In Casablanca, Morocco in December 1941, a cynical American expatriate meets a former lover, with unforeseen complications.
## 2                                                                                   Young Dorothy finds herself in a magical world where she makes friends with a lion, a scarecrow and a tin man as they make their way along the yellow brick road to talk with the Wizard and ask for the things they miss most in their lives. The Wicked Witch of the West is the only thing that could stop them.
## 3                                                                                                        Newspaper magnate, Charles Foster Kane is taken from his mother as a boy and made the ward of a rich industrialist. As a result, every well-meaning, tyrannical or self-destructive move he makes for the rest of his life appears in some way to be a reaction to that deeply wounding event.
## 4                                                                                                                                                                                                                                   In the year before the 1904 St. Louis World's Fair, the four Smith daughters learn lessons of life and love, even as they prepare for a reluctant move to New York.
## 5 Tom and Jerry is an American animated franchise and series of comedy short films created in 1940 by William Hanna and Joseph Barbera. Best known for its 161 theatrical short films by Metro-Goldwyn-Mayer, the series centers on a friendship/rivalry (a love-hate relationship) between the title characters Tom, a cat, and Jerry, a mouse. Many shorts also feature several recurring characters.
## 6                                                                                                                                                                                                                                       The spoiled daughter of a Georgia plantation owner conducts a tumultuous romance with a cynical profiteer during the American Civil War and Reconstruction Era.
##   release_year age_certification runtime
## 1         1943                PG     102
## 2         1939                 G     102
## 3         1941                PG     119
## 4         1945                       113
## 5         1940                         8
## 6         1940                 G     238
##                                              genres production_countries
## 1                       ['drama', 'romance', 'war']               ['US']
## 2                             ['fantasy', 'family']               ['US']
## 3                                         ['drama']               ['US']
## 4 ['drama', 'family', 'romance', 'music', 'comedy']               ['US']
## 5       ['animation', 'comedy', 'family', 'action']               ['US']
## 6            ['drama', 'romance', 'war', 'history']               ['US']
##   seasons   imdb_id imdb_score imdb_votes tmdb_popularity tmdb_score
## 1      NA tt0034583        8.5     577842          22.005      8.167
## 2      NA tt0032138        8.1     406105          56.631      7.583
## 3      NA tt0033467        8.3     446627          19.900      8.022
## 4      NA tt0037059        7.5      25589           8.311      7.000
## 5      16 tt6422744        7.7        859           1.400     10.000
## 6      NA tt0031381        8.2     319463          27.535      8.000
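
Notice that the genres column in the output above holds a whole list packed into one string, such as ['drama', 'romance', 'war']. Counting titles per genre would need one row per genre first. Here is a minimal sketch of one way to do that, assuming the stringr and tidyr functions below; the small sample_titles data frame is a stand-in copied from the first three rows of head(titles), and the names genre_rows and genre_counts are ours.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Stand-in sample copied from the first three rows shown by head(titles).
sample_titles <- data.frame(
  title  = c("Casablanca", "The Wizard of Oz", "Citizen Kane"),
  genres = c("['drama', 'romance', 'war']", "['fantasy', 'family']", "['drama']"),
  stringsAsFactors = FALSE
)

# Strip the brackets and quotes, then split on commas so that each
# title/genre pair gets its own row.
genre_rows <- sample_titles %>%
  mutate(genres = str_remove_all(genres, "\\[|\\]|'")) %>%
  separate_rows(genres, sep = ",\\s*")

# Count how many titles carry each genre, most common first.
genre_counts <- genre_rows %>%
  count(genres, sort = TRUE)

genre_counts
```

The same pipeline should work on the full titles data frame in place of sample_titles, though titles with an empty genre list would end up with an empty string as their genre and may need filtering out.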

First, let’s see how many movies and TV shows we are dealing with.

titles %>% 
  count(type)
##    type    n
## 1 MOVIE 2408
## 2  SHOW  622

Wow! That’s a lot more movies than shows!
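
To put a number on "a lot more", the counts can be turned into percentages. This is a small sketch that reuses the two counts printed above; the names type_counts and share_pct are ours, and with the data loaded the same mutate() step could be chained directly onto titles %>% count(type).

```r
library(dplyr)

# Counts copied from the count(type) output above.
type_counts <- data.frame(
  type = c("MOVIE", "SHOW"),
  n    = c(2408, 622),
  stringsAsFactors = FALSE
)

# Share of the catalogue taken up by each type, as a percentage.
type_share <- type_counts %>%
  mutate(share_pct = round(100 * n / sum(n), 1))

type_share
```

Movies make up roughly four out of every five titles in the catalogue.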

Now let’s find the five highest-rated movies and shows by IMDb score.

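# Sort by IMDb score, highest first, and keep the top five titles.
# imdb_score is kept in the selection so the ranking is visible in the result.
top_5_ratings <- titles %>%
  arrange(desc(imdb_score)) %>%
  select(title, type, release_year, genres, imdb_score) %>%
  head(5)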
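
One caveat with ranking on score alone: a title with very few votes can land at the top. In the head(titles) output, for example, Tom and Jerry has only 859 IMDb votes. Below is a hedged sketch of a ranking that first requires a minimum number of votes. The min_votes threshold of 1000 is an arbitrary value chosen for illustration, and sample_titles is a stand-in built from the six rows shown by head(titles).

```r
library(dplyr)

# Stand-in sample copied from the six rows shown by head(titles).
sample_titles <- data.frame(
  title      = c("Casablanca", "The Wizard of Oz", "Citizen Kane",
                 "Meet Me in St. Louis", "Tom and Jerry", "Gone with the Wind"),
  type       = c("MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVIE"),
  imdb_score = c(8.5, 8.1, 8.3, 7.5, 7.7, 8.2),
  imdb_votes = c(577842, 406105, 446627, 25589, 859, 319463),
  stringsAsFactors = FALSE
)

# Hypothetical cut-off: ignore titles with fewer votes than this.
min_votes <- 1000

# Keep well-voted titles, sort by score, and take the top three.
top_rated <- sample_titles %>%
  filter(imdb_votes >= min_votes) %>%
  arrange(desc(imdb_score)) %>%
  select(title, type, imdb_score, imdb_votes) %>%
  head(3)

top_rated
```

Swapping sample_titles for titles and head(3) for head(5) would apply the same idea to the full data set; the right threshold depends on how the vote counts are distributed.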
